Automatic modification of web pages

ABSTRACT

Systems and methods for quickly and easily getting information about, or included in, a paper document into a public or private digital page. One embodiment of an example system includes a scanner that generates scan information from at least a portion of a paper document and a processing system that receives the generated scan information from the scanner, accesses a database of digital documents, searches the database based on the received scan information, locates a digital document corresponding to the paper document, and sends either the digital content or a hyperlink to the digital content to a predetermined web page.

BACKGROUND

The use of printed books and documents (hereafter referred to as simply “documents”) has been commonplace for many hundreds of years. Over the centuries, various tools and strategies have evolved to try to make more effective use of printed documents. These range from handwritten (or typed) notes on the contents of documents (either on the document itself or in a separate but related document), to highlighting passages in a document deemed to be of greater significance, to manually copying passages from a document (or using a scanning copier, despite the fact that copyrights are often so infringed), to the simple act of including a printed index at the end of a document to facilitate locating information on a specific topic. Many new tools and strategies are needed now that a document can be accessed in an electronic, searchable format such as a file on a local computer or a web page that can be accessed with a browser.

The relatively recent innovation of providing a searchable electronic copy of a document that can be accessed using a standard personal computer is quite powerful in increasing the ease with which the desired contents can be accessed and utilized. When a traditional index is provided in such a context, once an entry is found, a single click of the mouse can take the user directly to the desired entry in the electronic text. Once a relevant entry has been found, its location can be retained as a “bookmark” and filed according to the user's choice, making future access to the location in the electronic document quick and easy.

It is a problem that these very useful tools for working with electronic documents cannot be used with the vast existing reserve of printed books and documents. Even though there are tremendous advantages that accrue with access to an electronic version of a document, these are obviously only available when such an electronic version is available (and a computer is available to access the electronic document). Even in those instances where such an electronic version is available, this still does nothing to enhance the actual use of the paper document itself. Furthermore, when newer revisions and updated versions of either the paper or the electronic version of a document become available, the owner of a previous version generally has little recourse but to go and purchase a new, updated copy of the material.

As is well known in the art, by using traditional methods for document processing (such as, for example, a flatbed scanner combined with appropriate computer software for optical character recognition), a user can create an electronic version of a paper document. However, in addition to the fact that such a task is laborious, time-consuming, and generally error-prone, it usually involves infringement of the copyright held by the author of the textbook in question. Further, even when an electronic version of a document is thus created, it is still subject to the limitations mentioned above: a computer is required to make any use of the additional features offered, and no additional utility is provided for the paper document itself. Despite the prevalence of computers, and despite the advantages conveyed by searchable electronic versions of documents, the continued widespread preference for creating and using paper documents is a clear indication of how attractive they remain to the average user. The portability, convenience, ease of viewing, and even the “feel” of paper documents clearly retain a powerful appeal to most individuals.

Therefore, there exists a need to allow users to easily navigate between printed and electronic versions of documents. There also exists a need to allow users to easily access electronic versions of, or links to, a paper document or article.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a system diagram illustrating a typical environment in which embodiments of the system operate;

FIG. 2 is a diagram of an example scanner used in the system shown in FIG. 1;

FIG. 3 is a flow diagram of an example process performed by the system of FIG. 1;

FIG. 4 is a diagram of an example paper document scanned by the scanner of FIG. 2; and

FIG. 5 is a display of an example web log produced by the system of FIG. 1.

DETAILED DESCRIPTION

Systems and methods for quickly and easily getting information about, or included in, a paper document into a public or private digital page are described. An example system includes a scanner that generates scan information of at least a portion of a paper document and a processing system in data communication with the scanner over a network. The processing system stores digital content corresponding to a plurality of paper documents in a database, receives the generated scan information from the scanner, searches the database to identify digital content associated with the received scan information, and, if the database search returns results, sends at least one of a portion of the identified digital content or an address associated with the identified digital content to a network-accessible location associated with the user. In one embodiment, the network-accessible location is a web log page associated with the user.

In another embodiment of the disclosed innovations, a blogger could scan some text from a paper document with a portable scanning device. The scanning device could either perform optical character recognition (OCR) on the scanned image or transmit either the raw image data or a partially processed version of the image data to a computer for remote OCR processing. The computer would submit at least a portion of the text to a search engine that would locate an electronic version of the paper document and return a hyperlink to the computer. The computer could then modify a predetermined blog page by adding an entry having at least a portion of the scanned text followed by a hyperlink to the electronic document. By optionally including a portion of the scanned text in the blog entry, the hyperlink can be put in context without requiring the blogger to type any explanatory text.
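The flow in this embodiment can be illustrated with a minimal Python sketch. It assumes OCR has already produced a text string, and it replaces the search engine with a hypothetical in-memory dictionary (`CORPUS`) that maps URLs to document text. The names `find_document`, `make_blog_entry`, and `post_scan`, the example URLs, and the HTML layout of the entry are illustrative assumptions and are not part of the disclosure.

```python
from html import escape
from typing import Optional

# Hypothetical stand-in for a search engine index: URL -> document text.
CORPUS = {
    "https://example.com/articles/1": "The quick brown fox jumps over the lazy dog near the river bank.",
    "https://example.com/articles/2": "Portable scanners let readers capture a line of printed text.",
}


def _normalize(text: str) -> str:
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def find_document(scanned_text: str, corpus: dict) -> Optional[str]:
    """Return the URL of the first document containing the scanned text, else None."""
    needle = _normalize(scanned_text)
    for url, body in corpus.items():
        if needle in _normalize(body):
            return url
    return None


def make_blog_entry(scanned_text: str, url: str, max_excerpt: int = 80) -> str:
    """Build an entry: a portion of the scanned text followed by a hyperlink."""
    excerpt = scanned_text.strip()[:max_excerpt]
    return f'<p>{escape(excerpt)} <a href="{escape(url, quote=True)}">source</a></p>'


def post_scan(scanned_text: str, corpus: dict, blog_entries: list) -> bool:
    """Append an entry to the blog if a matching document is found."""
    url = find_document(scanned_text, corpus)
    if url is None:
        return False
    blog_entries.append(make_blog_entry(scanned_text, url))
    return True


blog = []
posted = post_scan("capture a line of  printed text", CORPUS, blog)
```

In this sketch the scanned excerpt supplies the context for the link, so no explanatory text has to be typed. A real system would presumably use a full-text index rather than a substring test.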

FIG. 1 illustrates a typical environment 100 in which some embodiments of the system operate. As illustrated, the example operating environment 100 includes a scanning device 102 (operative to graphically capture a portion of a document 104), a computer 106, a wireless device 125, an account server 108 (having an account database), one or more document servers 110 (having document databases 112), a vendor server (having an item database), and an aggregator server 160 (having an aggregator database 165), all interconnected via a network such as the Internet 110.

The computer 106 may include a memory containing computer executable instructions for processing an order request from scanning device 102 by obtaining an order. An example of an order could include an identifier (such as a serial number of the scanning device 102 or an identifier that uniquely identifies the user of the scanner), scanning context information and/or scanned information that serves as the basis for a search of one or more document databases 112 to uniquely identify the digital document corresponding to the document 104 being scanned. The computer 106 also includes a processor and memory. In alternative embodiments, operating environment 100 may include more or fewer components.
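One way to picture such an order is as a small record. The following Python sketch is an assumption-laden illustration only: the class name `ScanOrder`, its field names, the example serial number, and the `search_terms` helper are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScanOrder:
    """Hypothetical container for one order sent from the scanning device."""

    identifier: str      # scanner serial number or an ID unique to the user
    scanned_info: str    # scanned information that serves as the search basis
    context: dict = field(default_factory=dict)  # optional scanning context

    def __post_init__(self):
        # An order without an identifier cannot be tied to a scanner or user.
        if not self.identifier:
            raise ValueError("an order needs a scanner or user identifier")

    def search_terms(self) -> dict:
        """Combine context and whitespace-normalized scanned text into one query."""
        return {**self.context, "text": " ".join(self.scanned_info.split())}


order = ScanOrder("SCANNER-0001", "systems and  methods", {"publication": "Example Daily"})
query = order.search_terms()
```

The resulting `query` dictionary could then be handed to whatever component searches the document databases.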

In other embodiments, the system 100 includes a wireless device 118, a vendor server 120 and an aggregator server 126. The servers 120 and 126 are coupled to each other via various sorts of networks (e.g., the Internet 130 or wireless network 132). Regardless of the manner by which the devices are coupled to each other, the scanning device 102, the computer 106, the wireless device 118, the account server 108, the document server 110, the vendor server 120 and the aggregator server 126 may be operable in accordance with well-known commercial transaction and communication protocols. In various embodiments, the functions and capabilities of the scanning device 102, the computer 106, and the wireless device 118 may be wholly or partially integrated into one device. Thus, the terms scanning device, computer and wireless device could refer to the same device depending upon whether the device incorporates functions or capabilities corresponding to the roles of the scanning device 102, the computer 106 and the wireless device 118.

Additionally, in various embodiments, the computer 106 and the account server 108 may be wholly or partially integrated. Thus, the terms computer and account server, as used herein, for the purpose of this specification, including the claims, shall be interpreted with the meaning of an appropriately equipped device, operating in accordance with either a computer or an account server role.

In accordance with another embodiment, an operating environment 100 includes a document server 110 that has speech recognition capabilities. In this environment, no scanning device 102 is required, and in lieu of scanning a portion of a rendered document, the user reads aloud the portion of the document and the document server 110 performs speech recognition of the spoken text portion to generate the search query to be processed. For example, the user may place a telephone call from wireless device 118 directly to an access number for document server 110 and, in response to automated prompts, read aloud the portion of the rendered document. Because the ultimate task of the server is to identify a document within its database corresponding to the spoken text (that may be assumed to occur within the known corpus of text within the database), the task of correctly recognizing the spoken words is vastly easier than the task of correctly recognizing spoken text when no such information is available to constrain the search domain. This speech-recognition-based approach also has the advantage that it can be implemented using the currently available technology infrastructure, and does not require a user base of individuals who possess a scanning device 102. Thus, in the following disclosure, functions described as being performed by a scanner can alternatively be performed using a speech-recognition-based approach. The unique user identity associated with the scanner may equivalently be associated with, for example, a cellular phone used to call document server 110.
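The benefit of a known corpus can be illustrated with a short Python sketch. It does not perform speech recognition. It assumes a recognizer has already produced an imperfect transcript and shows how fuzzy matching against known passages can still select the intended one. The function name `match_to_corpus`, the sample passages, and the similarity cutoff are hypothetical, and `difflib` is used here only as a convenient stand-in for whatever matching the document server would actually perform.

```python
import difflib
from typing import Optional


def match_to_corpus(noisy_transcript: str, passages: list, cutoff: float = 0.6) -> Optional[str]:
    """Return the known passage most similar to the transcript, or None."""
    lowered = [p.lower() for p in passages]
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    best = difflib.get_close_matches(noisy_transcript.lower(), lowered, n=1, cutoff=cutoff)
    if not best:
        return None
    # Map the lower-cased winner back to the original passage text.
    return passages[lowered.index(best[0])]


passages = [
    "the use of printed books has been commonplace",
    "a blog is an online journal",
]
# A transcript with several misrecognized words still resolves to the first passage.
recognized = match_to_corpus("the youth of printed box has been common place", passages)
```

Because only passages known to exist in the database are candidates, individual recognition errors matter less than they would in open-ended dictation.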

FIG. 2 is a block diagram of an embodiment of a scanner 102. The scanner 102 may include various means for ascertaining a context of a scan. In FIG. 2, the scanner 102 includes a scan port 150 to scan information from rendered documents, and various environmental sensors. The environmental sensors include one or more of a lens 156 (e.g. an aperture to a camera or light-sensitive device), a pixelator 160 to convert visual information of the environment into machine-compatible signals, a microphone 162 to convert sound of the environment (including spoken words) into machine-compatible signals, a Global Positioning System (GPS) 166 to provide a location function, and a tactile sensor 170 to provide sensitivity to contact signals. The scanner 102 also may include logic 172 to interact with the various sensors, possibly processing the received signals into different formats and/or interpretations. The logic 172 may be operable to fetch data and program instructions stored in associated memory such as RAM, ROM, or other suitable memory. The scanner 102 includes an interface 178 to communicate scanned information and environmental signals to a network and/or an associated computing device.

As an example of one use of the scanner 102, a reader may scan some text from a newspaper article with scanner 102. The text is scanned as a bit-mapped image via scan port 150. The logic 172 causes the bit-mapped image to be stored in memory 180. The logic 172 may also perform optical character recognition (OCR) or other post-scan processing on the bit-mapped image to convert it to text or an intermediate form of processed image data. The scanner 102 may then upload the bit-mapped image (or text or processed image data, if post-scan processing has been performed by the logic 172) to the computer 106 via the interface 178.

The scanner 102 further includes a velocity sensor 182 to sense velocity aspects of a scan action (e.g. how fast and in what direction a scan action occurs), an acceleration sensor 184 to detect acceleration aspects of a scan action, and a temperature sensor 188. Of course, not all scanner embodiments may include each of these features, and some embodiments may include additional features not found in the exemplary embodiment.

FIG. 3 illustrates an example process 200 performed by the system shown in FIG. 1. First, at a block 202, a user scans a document using the scanning device 102. At a block 204, information generated by the scanner 102 from the block 202 is sent to the document server 110 or some other searching system. Information uniquely identifying the scanner and/or the user (e.g., equipment serial numbers, billing information, subscription account number, etc.) is sent with the scanned information. At a block 206, the document server 110 or the searching system performs a search of documents stored in the database 112 or stored in a database distributed across the network 130 at various locations. At a decision block 210, the process 200 determines if there are results of the search. If there are no results of the performed search, then at a block 212, the document server 110 or the searching system sends a message that is presented to a user at the scanning device 102 or at the computer 106 or wireless device 118. The transmitted message is an error message that indicates to the operator of the scanning device 102 that the search based on the scan that they performed failed to identify at least one corresponding document in the database.

If there are results from the search, then the process 200 determines if the results are to go to one or more of a public or private location associated with the operator of the scanning device 102, see decision block 216. The results of the search, whether they be a link to a network-based location of the found results (e.g., a hyperlink) or an actual document (or portion thereof) identified in the search, are sent to a public location, at a block 220. The public location associated with the user can be in a number of different formats. A web log (blog) is one example of a public location that receives the results of the search. In one embodiment, the blog is configured to automatically post the results of the search in various formats. Blogs are described in more detail below. If the results of the search determined at the decision block 216 go to a private location, then a private location associated with the user receives the results of the search, see block 218. Whether the search results go to a public or private location associated with the user, various information of the search and the user may be recorded for later use, see block 224. The information recorded may be used, for example, to establish or modify a ranking within the document database 112 of any of the information or associated documents that were scanned and searched, and also to provide various demographic information with regard to the searcher (e.g., location, age, sex, etc.) and the items scanned by the searcher (which may be used by the document server 110 to create other useful databases).
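The decision flow of process 200 can be summarized in a short Python sketch. The block numbers in the comments refer to FIG. 3 as described above. Everything else is a hypothetical illustration: the names `Destination`, `UserLocations`, and `process_scan`, the substring search, and the shape of the log record are assumptions rather than details from the disclosure.

```python
from dataclasses import dataclass, field
from enum import Enum


class Destination(Enum):
    PUBLIC = "public"    # e.g. the user's blog
    PRIVATE = "private"  # e.g. a private page associated with the user


@dataclass
class UserLocations:
    public: list = field(default_factory=list)
    private: list = field(default_factory=list)


def process_scan(scan_text, user_id, database, destination, locations, log):
    """Run one scan through the search, routing, and recording steps."""
    # Block 206: search the document database (a simple substring test here).
    hits = [url for url, body in database.items() if scan_text.lower() in body.lower()]
    # Blocks 210 and 212: with no results, report an error to the operator.
    if not hits:
        return "error: no matching document found"
    # Blocks 216, 218 and 220: route results to the private or public location.
    target = locations.public if destination is Destination.PUBLIC else locations.private
    target.extend(hits)
    # Block 224: record information about the search and the user for later use.
    log.append({"user": user_id, "query": scan_text, "hits": len(hits),
                "destination": destination.value})
    return "ok"


database = {"https://example.com/a": "An article about portable scanners."}
locations, log = UserLocations(), []
status = process_scan("portable scanners", "user-1", database,
                      Destination.PUBLIC, locations, log)
```

Keeping the routing decision as an explicit parameter mirrors decision block 216, where the destination depends on criteria associated with the user.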

A blog (a contraction of “web log”) is an online journal or website. A blog usually shows the following primary characteristics:

frequently updated with new content;

content unit is a “post” or an “entry”, which need not be text but may also be pictures, sounds, videos, etc.;

posts are dated;

full posts or summaries are displayed on the blog home page with the most recent ones on top (listing posts in reverse chronological order makes it easy to see whether a blog has been updated recently or appears to be stalled, which, along with dated posts, is an incentive for authors to publish frequently in order to keep the content fresh); and

posts are accessible through a permanent link and/or chronological archives (daily/weekly/monthly, or a linear previous/next navigation).

A blog may show the following secondary characteristics, which are not necessarily distinctive of blogs but are instrumental in their adoption:

the publication process is supported by a microcontent or personal publishing system (the emergence of these free or cheap systems, which help people without knowledge of web technologies to easily publish content on the web, has been the key factor in the spread of blogs outside the web-savvy, geek community);

a news feed is available for use with a news aggregator;

visitors may comment on posts, with or without registration, and their comments may appear publicly along with the post. At any time, a blog author may decide on a post-by-post basis if comments are allowed (most blogs allow comments while most web sites do not). Because most blogs allow comments, a blog may provide a newsfeed that includes visitor comments to improve interaction between publishers and their audience;

posts may be classified by categories;

each post may display a list of external links that point to it, allowing readers to discover more sources around a particular topic (techniques known as TrackBacks, Pingbacks and Referrer tracking allow for the automatic creation of such back links between two websites);

display a list of other blogs (blogolist) and websites of interest (this is a great way to discover new blogs and also gives a better idea of who the authors are by showing whom they link to); and

each time a blog is updated, the blog may “ping” (i.e. signal to) a server that indexes and publishes a list of recently updated blogs (e.g. daypop).

FIG. 4 illustrates an article that is presented in a publication that a user may scan. The scan of a portion of the text may record enough information to perform an accurate search for an electronic copy of the associated document that is stored in the document database 112 or at a database associated with a vendor (e.g., publisher). After the user has scanned a portion of the document, the search is performed and the results of the search are sent to either a private or public location associated with the user depending upon some preset criteria.

In another embodiment, the article includes an icon 300 or other type of graphic image or text that, when highlighted by the scanner 102, automatically sends any of the results of the search directly to the blog associated with the user. Of course, the control icon 300 does not necessarily have to be on the paper document that includes the article. The control icon 300 may be scanned from any document. For example, the user may carry a wallet-sized card with various control icons that the user may scan to cause the system 100 to perform certain actions. For example, the user may scan a control icon 300 which causes the text from the next scan to be submitted to a search engine and the search results to be automatically posted to a predetermined blog.
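The control-icon behavior amounts to a small piece of state: scanning the icon arms the session, and the next scan is searched and posted. The Python sketch below is one hypothetical way to model this. The class `ScanSession`, the token string standing in for a recognized icon, and the injected `search` callable are all assumptions made for illustration.

```python
class ScanSession:
    """Tracks whether a control icon has armed the next scan for blog posting."""

    # Hypothetical token the recognizer would emit when it sees control icon 300.
    POST_NEXT_TO_BLOG = "ICON:POST_NEXT_TO_BLOG"

    def __init__(self, search, blog):
        self._search = search  # callable: text -> list of result links
        self._blog = blog      # list standing in for the predetermined blog
        self._armed = False

    def handle_scan(self, payload: str) -> str:
        """Process one scan and report what was done with it."""
        if payload == self.POST_NEXT_TO_BLOG:
            self._armed = True
            return "armed"
        results = self._search(payload)
        if self._armed:
            # The icon applies only to the next scan, so disarm afterwards.
            self._armed = False
            self._blog.extend(results)
            return "posted"
        return "searched"


blog = []
session = ScanSession(lambda text: [f"https://example.com/search?q={text}"], blog)
steps = [
    session.handle_scan("first"),
    session.handle_scan(ScanSession.POST_NEXT_TO_BLOG),
    session.handle_scan("second"),
    session.handle_scan("third"),
]
```

Only the scan that immediately follows the icon reaches the blog. Other control icons on a wallet-sized card could be modeled as further tokens handled the same way.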

FIG. 5 illustrates a blog 320 associated with the user of the scanner. Scanning the article shown in FIG. 4 may result in a link to that article being automatically entered and stored in the blog as a hyperlink 322. When a viewer selects the hyperlink 322, the digital article or a web page associated with the article is presented. This may also occur if the user scanned the icon 300 from the publication. Also, if the user has recorded his voice over the microphone 162 of the scanner 102, and desires that this voice recording be accessible through his blog 320, an audio (e.g., voice) icon 326 is presented on his blog 320. When a visitor viewing blog 320 activates the voice icon 326, the user's previously stored voice recording is played back to the visitor who activated the voice icon 326.

While the system has been illustrated and described, as noted above, many changes can be made without departing from its spirit and scope. Accordingly, the scope of the invention is not limited by such illustration and description. Instead, the invention should be determined entirely by reference to the claims that follow.

1-19. (canceled)
20. A method in a computing system for automatically modifying a distinguished document, comprising: receiving a text string recognized in an image captured from a rendered document; identifying a document containing the received text string; and modifying the distinguished document to include a link to the identified document.
21. The method of claim 20 wherein the distinguished document is a web page.
22. The method of claim 20 wherein the capture is performed by a distinguished user, the method further comprising identifying the distinguished document as corresponding to the distinguished user.
23. The method of claim 20, further comprising modifying the distinguished document to include a contiguous portion of text in the identified document that contains the received text string.
24. The method of claim 20 wherein the link that the distinguished document is modified to include comprises a URL.
25. The method of claim 20, further comprising analyzing the captured image to recognize the received text string.
26. The method of claim 25, further comprising capturing the image from the rendered document.
27. A computer-readable medium having contents capable of causing a computing system to perform a method for automatically modifying a distinguished document, the method comprising: receiving an image captured from a rendered document; on the basis of the received image, identifying a document corresponding to the rendered document from which the image was captured; and modifying the distinguished document to include a link to the identified document.
28. The computer-readable medium of claim 27 wherein the distinguished document is a web page.
29. The computer-readable medium of claim 27 wherein the capture is performed by a distinguished user, the method further comprising identifying the distinguished document as corresponding to the distinguished user.
30. The computer-readable medium of claim 27, the method further comprising modifying the distinguished document to include a contiguous portion of text in the identified document that is shown in the received image.
31. The computer-readable medium of claim 27 wherein the link that the distinguished document is modified to include comprises a URL.
32. The computer-readable medium of claim 27, the method further comprising capturing the image from the rendered document.
33. A computing system for automatically modifying a distinguished document, comprising: a receiver that receives an image captured from a rendered document; an identification subsystem that, on the basis of the received image, identifies a document corresponding to the rendered document from which the image was captured; and a distinguished document modification subsystem that modifies the distinguished document to include a link to the identified document.
34. The computing system of claim 33 wherein the distinguished document is a web page.
35. The computing system of claim 33 wherein the capture is performed by a distinguished user, further comprising a distinguished document identification subsystem that identifies the distinguished document as corresponding to the distinguished user.
36. The computing system of claim 33 wherein the distinguished document modification subsystem further modifies the distinguished document to include a contiguous portion of text in the identified document that is shown in the received image.
37. The computing system of claim 33 wherein the link that the distinguished document modification subsystem modifies the distinguished document to include comprises a URL.
38. The computing system of claim 33, further comprising an image sensor that captures the image from the rendered document.